Data Visualization and Analysis for the Kronos Incident Investigation Using R

VAST Challenge 2021 Mini-Challenge 2.

DONG Fang (Regina) https://www.linkedin.com/in/reginadongf/ (School of Computing and Information Systems, Singapore Management University)https://scis.smu.edu.sg/
07-17-2021

1. Background

1.1 Case overview

GAStech is a Tethys-based company having been operating a natural gas production site in the island country of Kronos for over 20 years. It has produced remarkable profits and developed strong relationships with the government of Kronos, but has not been as successful in demonstrating environmental stewardship.

In January, 2014, the leaders of GAStech are celebrating their new-found fortune as a result of the initial public offering of their very successful company. In the midst of this celebration, several employees of GAStech go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.

This case is designed aim at helping the law enforcement from Kronos and Tethys investigate the incident by using data visualization techniques. There are 3 challenges in VAST Challenge 2021 focusing on different aspects of case analysis. In this report we concentrated on visualization and analysis for Mini-Challenge 2.

1.2 Requirement for Mini-Challenge 2

Many of the Abila, Kronos-based employees of GAStech have company cars which are approved for both personal and business use. Those who do not have company cars have the ability to check out company trucks for business use, but these trucks cannot be used for personal business. The vehicles are installed with GPS tracked periodically as long as they are moving. Besides, in order to promote local businesses, Kronos based companies provide a Kronos Kares benefit card to GASTech employees giving them discounts and rewards in exchange for collecting information about their credit card purchases and preferences as recorded on loyalty cards.

Now the vehicle tracking data for the two weeks prior to the incident, car assignment list, transaction records in credit card and loyal card are available for analyzing.

The challenges to be dealt with are listed below:

No. Question
1 Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies?
2 Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find?
3 Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data?
4 Given the data sources provided, identify potential informal or unofficial relationships among GASTech personnel. Provide evidence for these relationships.
5 Do you see evidence of suspicious activity? Identify 1- 10 locations where you believe the suspicious activity is occurring, and why.

The detailed information and all the data needed for Mini-challenge 2 is available in VAST Challenge 2021 official website.

2. Data Preparation

2.1 Data description

The dataset used for Mini-Challenge 2 includes 4 CSV files, a package of ESRI shapefiles of Abila and Kronos, and a tourist map of Abila in JPEG format, as shown in the following screenshot.


Fig.1 Dataset for visualization and analysis

The data contents in the CSV files are listed below:

File Description Data Content
car-assignments.csv A list of vehicle assignments by employee Employee Last Name
Employee First Name
Car ID
Current Employment Type (Department)
Current Employment Title (job title)
gps.csv vehicle tracking data Timestamp
Car ID (integer)
Latitude
Longitude
cc_data.csv credit and debit card transaction data Timestamp
Location (name of the business)
Price (real)
Last 4 digits of the credit or debit card number
loyalty_data.csv loyalty card transaction data Timestamp
Location (name of the business)
Price (real)
Loyalty Number (A 5-character code starting with L that is unique for each card)

2.2 Steps for data preparation

2.2.1 Installing and launching R Packages

We used R studio as the tool to import, process, visualize and analyze the data.

The first thing is run this line of code to clear the environment and remove existing R objects (if any).

rm(list=ls())

The code chunk below is used to install and launch the packages necessary for next steps.

packages = c('ggiraph', 'plotly','DT', 'patchwork', 
             'raster', 'sf','tmap', 'mapview','gifski',
             'tidyverse', 'mlr','lubridate')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

2.2.2 Importing relevent data

First of all, below code chunk is used to identify the encoding type of the CSV file to make sure no garbled characters in the imported data.

guess_encoding("data/car-assignments.csv")
# A tibble: 1 x 2
  encoding confidence
  <chr>         <dbl>
1 ASCII             1
guess_encoding("data/gps.csv")
# A tibble: 1 x 2
  encoding confidence
  <chr>         <dbl>
1 ASCII             1
guess_encoding("data/cc_data.csv")
# A tibble: 2 x 2
  encoding     confidence
  <chr>             <dbl>
1 windows-1252       0.41
2 windows-1254       0.25
guess_encoding("data/loyalty_data.csv")
# A tibble: 2 x 2
  encoding     confidence
  <chr>             <dbl>
1 windows-1254       0.26
2 windows-1252       0.24

According to above results, “windows-1254” would be set as the encoding for cc_data.csv and loyalty_data.csv when importing the file, using read_csv() function in tidyverse package.

car_ass <- read_csv("data/car-assignments.csv")
gps <- read_csv("data/gps.csv")
cc <- read_csv("data/cc_data.csv", locale = locale(encoding = "windows-1252"))
loyalty <- read_csv("data/loyalty_data.csv", locale = locale(encoding = "windows-1252"))

2.2.3 Converting data types

As shown below, we need to check if the data type is proper in the imported tibble data table. It’s obvious that the Timestamp in gps.csv, cc_data.csv and loyalty_data.csv should be in datetime format but now it’s in character format. Besides, CarID in car-assignments.csv, id in gps.csv and last4ccnum in cc_data.csv should be converted from numerical data to categorical data.

Rows: 44
Columns: 5
$ LastName               <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName              <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID                  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType  <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~
Rows: 685,169
Columns: 4
$ Timestamp <chr> "01/06/2014 06:28:01", "01/06/2014 06:28:01", "01/~
$ id        <dbl> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ lat       <dbl> 36.07623, 36.07622, 36.07621, 36.07622, 36.07621, ~
$ long      <dbl> 24.87469, 24.87460, 24.87444, 24.87425, 24.87417, ~
Rows: 1,490
Columns: 4
$ timestamp  <chr> "01/06/2014 07:28", "01/06/2014 07:34", "01/06/20~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
Rows: 1,392
Columns: 4
$ timestamp  <chr> "01/06/2014", "01/06/2014", "01/06/2014", "01/06/~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

To achieve this, mdy_hms() and mdy() functions in lubridate package are used to covert the data type to datetime, and as.character() function is used to convert data as characters.

gps$Timestamp = mdy_hms(gps$Timestamp)
cc$timestamp = mdy_hm(cc$timestamp)
loyalty$timestamp = mdy(loyalty$timestamp)

car_ass$CarID = as.character(car_ass$CarID)
gps$id = as.character(gps$id)
cc$last4ccnum = as.character(cc$last4ccnum)

Since the transaction date in credit and loyalty card data are all in January, the date of month, weekday, hour of time can be derived from timestamp and displayed in different columns of cc_data.csv and loyalty_data.csv. The same as GPS tracking data. As shown in below code chunk, day() function is used to get the date, wday() to get the weekday, hour() to get the hour of time.

cc$day = day(cc$timestamp)
cc$weekday = wday(cc$timestamp, label = T, abbr = T)
cc$hour = hour(cc$timestamp)
loyalty$day = day(loyalty$timestamp)
loyalty$weekday = wday(loyalty$timestamp, label = T, abbr = T)
gps$day = as.factor(day(gps$Timestamp))
gps$weekday = wday(gps$Timestamp, label = T, abbr = T)
gps$hour = as.factor(hour(gps$Timestamp))

2.2.4 Exploring and cleaning data

Then we do some exploration for the data and check the missing values by using the code chunks below. Only CarID in car-assignments.csv has 9 missing values.

knitr::kable(summarizeColumns(car_ass), caption = "EDA for Car Assigment Data", digits = 2)
Table 1: EDA for Car Assigment Data
name type na mean disp median mad min max nlevs
LastName character 0 NA 0.95 NA NA 1 2 38
FirstName character 0 NA 0.95 NA NA 1 2 43
CarID character 9 NA NA NA NA 1 1 35
CurrentEmploymentType character 0 NA 0.70 NA NA 5 13 5
CurrentEmploymentTitle character 0 NA 0.80 NA NA 1 9 21
knitr::kable(summarizeColumns(gps), caption = "EDA for GPS Tracking Data", digits = 2)
Table 2: EDA for GPS Tracking Data
name type na mean disp median mad min max nlevs
Timestamp POSIXct 0 NA NA NA NA 1.00 22.00 303206
id character 0 NA 0.96 NA NA 2317.00 24713.00 40
lat numeric 0 36.06 0.01 36.06 0.01 36.05 36.09 0
long numeric 0 24.88 0.01 24.88 0.01 24.83 24.91 0
day factor 0 NA 0.88 NA NA 12208.00 82786.00 14
weekday ordered 0 NA 0.79 NA NA 28829.00 142148.00 7
hour factor 0 NA 0.85 NA NA 182.00 106146.00 21
knitr::kable(summarizeColumns(cc), caption = "EDA for Credit Card Transaction Data", digits = 2)
Table 3: EDA for Credit Card Transaction Data
name type na mean disp median mad min max nlevs
timestamp POSIXct 0 NA NA NA NA 1.00 16 1116
location character 0 NA 0.86 NA NA 1.00 212 34
price numeric 0 207.70 740.86 28.24 24.62 2.01 10000 0
last4ccnum character 0 NA 0.98 NA NA 4.00 37 55
day integer 0 11.99 3.95 12.00 5.93 6.00 19 0
weekday ordered 0 NA 0.82 NA NA 104.00 264 7
hour integer 0 13.86 4.56 13.00 7.41 3.00 22 0
knitr::kable(summarizeColumns(loyalty), caption = "EDA for Loyalty Card Transaction Data", digits = 2)
Table 4: EDA for Loyalty Card Transaction Data
name type na mean disp median mad min max nlevs
timestamp Date 0 NA NA NA NA 43 123.00 14
location character 0 NA 0.86 NA NA 1 195.00 33
price numeric 0 204.33 719.01 22.84 16.84 3 4983.52 0
loyaltynum character 0 NA 0.96 NA NA 3 55.00 54
day integer 0 12.03 3.95 13.00 5.93 6 19.00 0
weekday ordered 0 NA 0.82 NA NA 97 245.00 7

The records containing missing values in car-assignment.csv are shown below. We can see these records are all company trucks not for personal use. As the number of missing values are not large and the missed fileds not quite important, no need to clean or exclude these records.

Table 5: The Missing Values in Car Assignment Dataset
LastName FirstName CarID CurrentEmploymentType CurrentEmploymentTitle
Hafon Albina NA Facilities Truck Driver
Hawelon Benito NA Facilities Truck Driver
Hawelon Claudio NA Facilities Truck Driver
Mies Henk NA Facilities Truck Driver
Morlun Valeria NA Facilities Truck Driver
Morlun Adan NA Facilities Truck Driver
Morluniau Cecilia NA Facilities Truck Driver
Nant Irene NA Facilities Truck Driver
Scozzese Dylan NA Facilities Truck Driver

3. Data Visualization

3.1 Histogram showing frequecy of transactions

First of all, a 2d histogram for credit card transaction frequency with location by hour was built by below code chunk. A slider is added to select range of days as the filtering criterion, and a data table is linked to the graph to show details related to the selections.

d <- highlight_key(cc)

# Plot the 2d histogram for credit card
gra_1 <- plot_ly(data = d, x = ~as.factor(hour), y = ~location,
                 hovertemplate = paste(
                   " %{yaxis.title.text}: %{y}<br>",
                   "%{xaxis.title.text}: %{x}<br>",
                   "Transaction Count: %{z}",
                   "<extra></extra>")) %>%
  add_histogram2d(colors = "Blues") %>%
  layout(title = "<b>Graph.1 Credit Card Transcation Frequency by Hour</b>",
         xaxis = list(title = "Time", tickmode = "linear"),
         yaxis = list(title="Location", tickmode = "linear")
         )

# Add a slider to the graph to select the range of date, and
#  link a data table to show details
crosstalk::bscols(
                  crosstalk::filter_slider("day", "Date of Jan", 
                                           d, ~day, step = 1, 
                                           animate = T, ticks = F),
                  gra_1,
                  DT::datatable(d, filter=c("top"), class = "hover",
                                options = list(pageLength = 5,
                                               columnDefs = list(
                                                 list(visible = FALSE,
                                                      targets = c(5, 7)))
                                               )),
                  widths = 10)

Then the 2d histograms for credit card and loyalty card transaction frequency with location by day were created as below. Added a dropdown list to select the card number as filtering criteria to show the transaction frequency of specific card owner along days.

d1 = highlight_key(cc)
d2 = highlight_key(loyalty)

# Plot the 2d histogram of credit card transaction frequency
gra_2.1 <- plot_ly(data = d1, x = ~as.factor(day), y = ~location,
                 hovertemplate = paste(
                   " %{yaxis.title.text}: %{y}<br>",
                   "%{xaxis.title.text}: %{x}<br>",
                   "Transaction Count: %{z}",
                   "<extra></extra>")) %>%
  add_histogram2d(colors = "Blues") %>%
  layout(title = "<b>Graph.2-1 Credit Card Transaction Frequency by Day</b>",
         #annotations = list(text = "Credit Card", showarrow = F, x =10, y=32),
         xaxis = list(title = "Date of Jan", tickmode = "linear"),
         yaxis = list(title = "Location", tickmode = "linear")
         )

# Plot the 2d histogram of loyalty card transaction frequency
gra_2.2 <- plot_ly(data = d2, x = ~as.factor(day), y = ~location,
                 hovertemplate = paste(
                   " Location: %{y}<br>",
                   "Date of Jan: %{x}<br>",
                   "Transaction Count: %{z}",
                   "<extra></extra>")) %>%
  add_histogram2d(colors = "Greys") %>%
  layout(title = "<b>Graph.2-2 Loyalty Card Transaction Frequency by Day</b>",
         #annotations = list(text = "Loyalty Card", showarrow = F, x =10, y=32),
         xaxis = list(title = "Date of Jan", tickmode = "linear"),
         yaxis = list(title = "Location", tickmode = "linear", visible = T)
         )

# Add a dropdown list to the graph to filter the card number
gra_2.1_c <- crosstalk::bscols(crosstalk::filter_select(
                                 "ccnum", 
                                 "Choose last 4 credit card Number",
                                 d1, ~last4ccnum,
                                 multiple = T),
                               gra_2.1,
                               widths = 10)
gra_2.2_c <- crosstalk::bscols(crosstalk::filter_select(
                                 "lonum", 
                                 "Choose loyalty card number",
                                 d2, ~loyaltynum,
                                 multiple = T),
                               gra_2.2,
                               widths = 10)

gra_2.1_c
gra_2.2_c

3.2 Geographical graph showing movement path

Now take the GPS tracking data into account, it’s necessary to draw movement path on the tourist map with the GPS tracking data, so that we can see where the employees have gone and gathered together during the two weeks before the disappearance.

The first thing to do is plotting Raster Layer of the tourist map of Abila, Kronos, as the background map, and import Abila GIS data layer.

bgmap <- raster("data/MC2-tourist.jpg")
bgmap
class      : RasterLayer 
band       : 1  (of  3  bands)
dimensions : 1535, 2740, 4205900  (nrow, ncol, ncell)
resolution : 1, 1  (x, y)
extent     : 0, 2740, 0, 1535  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : MC2-tourist.jpg 
names      : MC2.tourist 
values     : 0, 255  (min, max)
Abila_st <- st_read(dsn = "data/Geospatial", layer = "Abila")
Reading layer `Abila' from data source 
  `D:\ReginaDong\DataViz_blog\_posts\2021-07-17-assignvastchallenge\data\Geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84

According to the result of bgmap, the extent of bound is (0, 2740, 0, 1535) for (xmin, xmax, ymin, ymax), while the bounding box of Abila_st is (24.82401, 24.90997, 36.04502, 36.09492) for (xmin, xmax, ymin, ymax). So it’s necessary to reset the coordinate bounding of bgmap according to Abila_st, or the GPS tracks won’t be matched and shown on the background map normally. Below code chunk is for setting the extreme coordinates of bgmap.

xmin(bgmap) = 24.82401
xmax(bgmap) = 24.90997
ymin(bgmap) = 36.04502
ymax(bgmap) = 36.09492

The code chunk below is used to convert GPS spatial data into a Simple Feature (SF) data frame.

gps_sf <- st_as_sf(gps, 
                   coords = c("long", "lat"),
                   crs = 4326)

Then before combining the background map and the GPS tracking lines to generate the movement path, the spatial data need to be grouped by id, day and hour respectively.

# Group by id and day
gps_path <- gps_sf %>%
  group_by(id, day) %>%
  summarize(m = mean(Timestamp), 
            do_union=FALSE) %>%
  st_cast("LINESTRING")

np = npts(gps_path, by_feature = T)
gps_path2 <- cbind(gps_path, np) %>%
  filter(np > 1) # exclude orphan coordinate records

# Group by id and hour
gps_hour <- gps_sf %>%
  group_by(id, hour) %>%
  summarise(m = mean(Timestamp),
            do_union = FALSE) %>%
  st_cast("LINESTRING")

Set day as filtering criteria by using filter() function, and differentiate the line colors by id through setting col argument of tm_lines() function, so that the geographical graph could show the movement track of all cars in specific date. Below code chunk is used to create the graph.

# Filter GPS spatial data by date of Jan
gps_path_selected <- gps_path2 %>%
  filter(day=="6") 

# Plot the moving path
tmap_mode("view")
gra_3 <- tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col =  "id", palette = "Dark2") +
  tmap_options(max.categories = 44) 

gra_3

Graph.3 GPS Moving Route in A Specific Day

Set hour as filtering criteria by using filter() function, and differentiate the line colors by id through setting col argument of tm_lines() function, so that the geographical graph could show the movement track of all cars in specific hour of days. Below code chunk is used to create the graph.

# Filter GPS spatial data by hour of time
gps_hour_selected <- gps_hour %>%
  filter(hour=="7")

# Plot the moving path
tmap_mode("view")
gra_4 <- tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_hour_selected) +
  tm_lines(col = "id", palette = "Dark2") +
  tmap_options(max.categories = 44)

gra_4

Graph.4 GPS Moving Route of in A Specific Hour

3.3 Line chart of average price by weekday

In order to compare the money spent in different locations, a line chart showing average price spent in credit card and loyal card in weekdays and weekends were created by below code chunk.

In the code chunk, first calculate the average price grouped by weekday and location using group_by() and summarize() function, then combine them in one table with rbind() function, which is used to plot the line chart using functions of plotly packages. A dropdown list is added to select the location to be filtered.

# Calculate the average price by weekday and location, then combine the results
mean_price <- cc %>%
  group_by(weekday, location) %>%
  summarize(Avg.Price = mean(price)) %>%
  ungroup() %>%
  mutate(card = "Credit Card") %>%
  rbind(
    loyalty %>%
      group_by(weekday, location) %>%
      summarize(Avg.Price = mean(price)) %>%
      ungroup() %>%
      mutate(card = "Loyalty Card")
  )

# Plot the line chart
d <- highlight_key(mean_price)
gra_5 <- plot_ly(data = d, x = ~weekday, y= ~Avg.Price, 
                 color = ~card, colors = "Paired", 
                 linetype = ~card,
                 type = 'scatter', mode = 'lines+markers') %>%
  layout(title = "<b>Graph.5 Average Transaction Price by Weekday</b>")

# Add a dropdown list to select one location
crosstalk::bscols(crosstalk::filter_select("loc", "Choose a location first",
                                           d, ~location, multiple = F),
                  gra_5,
                  widths = 10)

3.4 Boxplot for transaction price by location

As the line chart above can’t vividly show the comparison of transaction price between locations nor the outliers among them, a box plot is created by below code chunk, where the first step is to combine cc and loyalty dataset as the plotting data source, then using plot_ly() function to generate the graph. A dropdown list is added to select the weekday to be filtered, and a data table is linked to the boxplot to show details related to the selections.

# Combine cc and loyalty data
cards <- cc %>%
  select(-hour) %>%
  rename(cardnum = last4ccnum) %>%
  mutate(card = "Credit Card") %>%
  rbind(loyalty %>%
    rename(cardnum = loyaltynum) %>%
      mutate(card = "Loyalty Card"))

# Generate the box plot
d <- highlight_key(cards)
gra_6 <- plot_ly(data = d,x = ~location, y= ~price, 
                 color = ~card, colors = "Paired",
                 type = 'box', boxmean = T) %>%
  layout(title = "<b>Graph.6 Box Plot of Transaction Price by Location</b>",
         boxmode = "group")

# Add a dropdown list to filter the weekday, and
#  link a data table to show details
crosstalk::bscols(
                  crosstalk::filter_select("wdy", "Choose the weekday",
                                             d, ~weekday, multiple = T),
                  gra_6,
                  DT::datatable(d, filter=c("top"), class = "hover",
                                options = list(pageLength = 5,
                                               columnDefs = list(
                                                 list(visible = FALSE,
                                                      targets = c(5)))
                                               )),
                  widths = 22)

4. Data Analysis and Insights

4.1 Anlysis for question 1


Fig.2 The total transaction frequency of credit card by hour and date

Similarly, if see the transaction frequency of loyalty card by date as shown below, Katerina’s Cafe, Hippokampos, Hallowed Grounds, Guy’s Gyros and Brew’ve Been Served still hold the darkest color which means lots of people made purchase at these places.


Fig.3 The total transaction frequency of loyalty card by date

4.2 Anlysis for question 2

5. Conclusion

Backups

#bgmap <- raster("data/Geospatial/MC2-tourist.tif")
bgmap <- raster("data/MC2-tourist.jpg")
#res(bgmap) <- c(3.16216e-05, 3.16216e-05)
#crs(bgmap) <- "+proj=longlat +datum=WGS84"
xmin(bgmap) = 24.82419
xmax(bgmap) = 24.90976
ymin(bgmap) = 36.04499
ymax(bgmap) = 36.09543

bgmap
#Abila_st <- st_read(dsn = "data/Geospatial", layer = "Abila")
#Abila_st
#gra_2 <- subplot(gra_2.1, gra_2.2, nrows = 1, shareY = T) %>%
#  layout(title = "<b>Graph.2 Transaction Frequency by Day</b>",
#         xaxis = list(title = "Date of Jan"),
#         xaxis2 = list(title = "Date of Jan"),
#         yaxis = list(title = "Location"),
#         autosize = F, width = 800, width2 = 800, height = 400
#         )

tm_layout(title = “Graph.3 GPS Moving Route in A Specific Day”)

tm_layout(title = “Graph.4 GPS Moving Route of in A Specific Hour”)

bg_jpg <- image_read("data/MC2-tourist.jpg")
bg_tif <- image_convert(bg_jpg, "tif")
image_info(bg_tif)
image_info(bg_jpg)
image_write(bg_tif, "data/MC2-tourist.tif")
# Backup
gps_c <- dplyr::select(gps, lat, long)
segment = traclus(gps_c, 36, 5)
# Calculate the average price by weekday and location
cc_mean <- cc %>%
  group_by(weekday, location) %>%
  summarize(Avg.Price = mean(price)) %>%
  ungroup() %>%
  mutate(card = "Credit Card")

loy_mean <- loyalty %>%
  group_by(weekday, location) %>%
  summarize(Avg.Price = mean(price)) %>%
  ungroup() %>%
  mutate(card = "Loyalty Card")

# Combine the data table
mean_price <- rbind(cc_mean, loy_mean)

# Plot the line chart
d <- highlight_key(mean_price)
gra_5 <- plot_ly(data = d, x = ~weekday, y= ~Avg.Price, 
                 color = ~card, colors = "Paired", 
                 linetype = ~card,
                 type = 'scatter', mode = 'lines+markers') %>%
  layout(title = "<b>Graph.5 Average Transaction Price by Weekday</b>")

# Add a dropdown list to select one location
crosstalk::bscols(crosstalk::filter_select("loc", "Choose a location first",
                                           d, ~location, multiple = F),
                  gra_5,
                  widths = 10)
cc_mean <- cc %>%
  group_by(weekday, location) %>%
  summarize(Avg.Price = mean(price)) %>%
  ungroup() %>%
  mutate(card = "Credit Card")

loy_mean <- loyalty %>%
  group_by(weekday, location) %>%
  summarize(Avg.Price = mean(price)) %>%
  ungroup() %>%
  mutate(card = "Loyalty Card")

mean_price <- rbind(cc_mean, loy_mean)

gra_5.1 <- mean_price %>%
  ggplot(aes(weekday, Avg.Price, colour = card)) +
  geom_line() +
  geom_point(size = 1) +
  labs(title = "<b>Graph.5 Average Transaction Price by Weekday</b>") +
  facet_wrap(~location, ncol = 4, scales = "free_y") +
  theme_light() + 
  #scale_color_brewer(palette = "Dark2") +
  scale_y_continuous(name = "Average Price")

gra_5.1 <- ggplotly(gra_5.1) %>%
  layout(autosize = F, width = 1000, height = 2000)

gra_5.1
# Combine cc and loyalty data
cards <- cc %>%
  select(-hour) %>%
  rename(cardnum = last4ccnum) %>%
  mutate(card = "Credit Card") %>%
  rbind(loyalty %>%
    rename(cardnum = loyaltynum) %>%
      mutate(card = "Loyalty Card"))

# Generate the box plot
d <- highlight_key(cards)
gra_6 <- plot_ly(data = d,x = ~location, y= ~price, 
                 color = ~card, colors = "Dark2",
                 type = 'box', boxmean = T) %>%
  layout(title = "<b>Graph.6 Box Plot of Transaction Price by Location</b>",
         boxmode = "group")

# Add a dropdown list to filter the weekday
crosstalk::bscols(gra_6,
                  crosstalk::filter_select("wdy", "Choose the weekday",
                                             d, ~weekday, multiple = T),
                  widths = 22)